04. Collecting your Data
Data Guide
A simple training dataset is provided in this project's repository. It will let you verify that your segmentation network is at least semi-functional. However, if you want to improve your score, you may wish to collect additional training data. To do so, follow the steps below.
The data directory is organized as follows:
- `data/runs` - contains the results of prediction runs
- `data/train/images` - contains images for the training set
- `data/train/masks` - contains masked (labeled) images for the training set
- `data/validation/images` - contains images for the validation set
- `data/validation/masks` - contains masked (labeled) images for the validation set
- `data/weights` - contains trained TensorFlow models
- `data/raw_sim_data/train/run1` - contains raw simulator recordings for the training set
- `data/raw_sim_data/validation/run1` - contains raw simulator recordings for the validation set
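If you are setting up the project on a fresh machine, the layout above can be created with a short script. This is a minimal sketch, not part of the project's code; the folder names simply mirror the list above.

```python
import os

# Subdirectories of data/ as listed in the guide above.
SUBDIRS = [
    "runs",
    "train/images",
    "train/masks",
    "validation/images",
    "validation/masks",
    "weights",
    "raw_sim_data/train/run1",
    "raw_sim_data/validation/run1",
]

def make_data_tree(base="data"):
    """Create every expected subdirectory under base (no-op if present)."""
    for sub in SUBDIRS:
        os.makedirs(os.path.join(base, sub), exist_ok=True)

make_data_tree()
```

`exist_ok=True` makes the script safe to re-run on an existing checkout.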
Training Set
- Run QuadSim
- Click the `DL Training` button
- Set patrol points, path points, and spawn points as taught in the next lesson
Data Collection Guide
- With the simulator running, press "r" to begin recording.
- In the file selection menu, navigate to the `data/raw_sim_data/train/run1` directory.
- Optional: to speed up data collection, press "9" (keys "1" through "9" set the collection speed, with lower numbers collecting more slowly).
- When you have finished collecting data, press "r" again to stop recording.
- To reset the simulator, press `<esc>`.
- To collect multiple runs, create directories `data/raw_sim_data/train/run2`, `data/raw_sim_data/train/run3`, etc., and repeat the above steps.
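For the multi-run step, a small helper can create the next unused `runN` folder for you instead of naming it by hand. This is an illustrative sketch, not part of the project:

```python
import os
import re

def next_run_dir(parent="data/raw_sim_data/train"):
    """Create and return the next unused runN folder under parent."""
    os.makedirs(parent, exist_ok=True)
    # Collect the numbers of existing runN folders, e.g. run1 -> 1.
    taken = [int(m.group(1)) for d in os.listdir(parent)
             if (m := re.fullmatch(r"run(\d+)", d))]
    path = os.path.join(parent, "run{}".format(max(taken, default=0) + 1))
    os.makedirs(path)
    return path
```

Call it before each recording session and point the simulator's file selection menu at the directory it returns.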
Validation Set
To collect the validation set, repeat both sets of steps above, but use the `data/raw_sim_data/validation` directory instead of `data/raw_sim_data/train`.
Image Preprocessing
Before the network is trained, the images first need to undergo a preprocessing step. Preprocessing transforms the depth masks from the simulator into binary masks suitable for training a neural network. It also converts the images from .png to .jpeg, producing a reduced-size dataset suitable for uploading to the Udacity Workspace or an AWS instance.
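Conceptually, the depth-mask-to-binary-mask conversion works like the sketch below. This is purely illustrative: the real conversion lives in `preprocess_ims.py`, and the zero threshold here is an assumed placeholder.

```python
def to_binary_mask(depth_mask, threshold=0.0):
    """Threshold a 2-D depth mask (list of rows) into 0/1 values."""
    return [[1 if px > threshold else 0 for px in row] for row in depth_mask]

# A tiny hypothetical 2x2 depth mask: nonzero depth marks the target.
depth = [[0.0, 0.4],
         [0.9, 0.0]]
binary = to_binary_mask(depth)  # [[0, 1], [1, 0]]
```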
To run preprocessing:
$ python preprocess_ims.py
Note: If your data is stored as suggested in the steps above, this script should run without error.
Important Note 1:
Running `preprocess_ims.py` does not delete files in the `processed_data` folder. This means that if you leave images in `processed_data` and collect a new dataset, some of the existing data will be overwritten and some will be left as is. It is recommended to delete the `train` and `validation` folders inside `processed_data` (or the entire folder) before running `preprocess_ims.py` with a new set of collected data.
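The recommended cleanup can be scripted. A hedged sketch, assuming the folder names used in the note above; adjust the base path if your layout differs:

```python
import os
import shutil

def clear_processed(base="data/processed_data"):
    """Remove the train and validation folders under base, if present."""
    for sub in ("train", "validation"):
        path = os.path.join(base, sub)
        if os.path.isdir(path):
            shutil.rmtree(path)
```

Run it (or delete the folders by hand) before each fresh run of `preprocess_ims.py`.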
Important Note 2:
The notebook and supporting code assume your training and validation data are in `data/train` and `data/validation`. After you run `preprocess_ims.py`, you will have a new `train` (and possibly `validation`) folder inside `processed_ims`. Rename or move `data/train` and `data/validation`, then move `data/processed_ims/train` and `data/processed_ims/validation` into `data/`.
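That rename-and-move shuffle might look like the following sketch. The `_old` suffix is an invented convention for setting the previous set aside, and the `processed_ims` path follows the note's wording; neither is part of the project's code.

```python
import os
import shutil

def promote_processed(data_dir="data"):
    """Move freshly processed train/validation folders into data_dir,
    setting any existing ones aside with an _old suffix."""
    for sub in ("train", "validation"):
        current = os.path.join(data_dir, sub)
        fresh = os.path.join(data_dir, "processed_ims", sub)
        if os.path.isdir(current):
            shutil.move(current, current + "_old")  # keep the previous set
        if os.path.isdir(fresh):
            shutil.move(fresh, current)
```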
Important Note 3:
Merging multiple `train` or `validation` sets may be difficult, so it is recommended that you control the dataset's contents through what you include in `raw_sim_data/train/run1`, possibly with many different runs in the directory. You can create a temporary folder in `data/` to store raw run data you don't currently want to use but may find useful later. Choose which `run_x` folders to include in `raw_sim_data/train` and `raw_sim_data/validation`, then run `preprocess_ims.py` from within the `code/` directory to generate your new training and validation sets.
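The temporary-folder workflow described in this note might look like the sketch below; `data/unused_runs` is an invented name for the parking folder, not something the project defines.

```python
import os
import shutil

def park_run(run_dir, parking="data/unused_runs"):
    """Move a run_x folder out of the active dataset into a parking folder."""
    os.makedirs(parking, exist_ok=True)
    dest = os.path.join(parking, os.path.basename(run_dir))
    shutil.move(run_dir, dest)
    return dest
```

Move runs back into `raw_sim_data/train` or `raw_sim_data/validation` the same way before re-running `preprocess_ims.py`.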